(54) A microprocessor with configurable on-chip memory



(57) A processor structure and method of operation
are disclosed that comprise a user-configurable on-chip
program memory. The memory comprises an on-chip
memory 31 and a program memory controller 30 that
reconfigures memory 31 in response to control values
that may be modified by CPU core 20 under program
control. In one mode, memory 31 may be mapped into
internal address space. In other modes, memory 31 may
be configured as an on-chip cache. In conjunction with
the cache configuration, the program memory controller
may comprise a tag RAM that is initialized upon a
transition to cache mode. Program memory controller 30
handles memory mode transitions and data requests;
CPU core 20 preferably requests stored instructions
from controller 30 in a uniform fashion regardless of
memory mode.
Description

FIELD OF THE INVENTION 

[0001] The present invention pertains generally to mi- 
croprocessor architectures, and pertains more particu- 
larly to microprocessors having on-chip program mem- 
ory capability. 

BACKGROUND OF THE INVENTION 

[0002] A microprocessor is a circuit that combines the 
instruction-handling, arithmetic, and logical operations 
of a computer on a single chip. A digital signal processor 
(DSP) is a microprocessor optimized to handle large vol- 
umes of data efficiently. Such processors are central to 
the operation of many of today's electronic products, 
such as high-speed modems, high-density disk drives, 
digital cellular phones, and complex automotive sys- 
tems, and will enable a wide variety of other digital sys- 
tems in the future. The demands placed upon DSPs in 
these environments continue to grow as consumers 
seek increased performance from their digital products. 
[0003] Designers have succeeded in increasing the 
performance of DSPs and microprocessors in general 
by increasing clock speeds, by removing architectural 
bottlenecks in circuit designs, by incorporating multiple 
execution units on a single processor circuit, and by de- 
veloping optimizing compilers that schedule operations 
to be executed by the processor in an efficient manner. 
As further increases in clock frequency become more 
difficult to achieve, designers have embraced the multi- 
ple execution unit processor as a means of achieving 
enhanced DSP performance. For example, Figure 2 
shows a block diagram of the CPU data paths of a DSP 
having eight execution units, L1, S1, M1, D1, L2, S2,
M2, and D2. These execution units operate in parallel 
to perform multiple operations, such as addition, multi- 
plication, addressing, logic functions, and data storage 
and retrieval, simultaneously. 

[0004] Theoretically, the performance of a multiple ex- 
ecution unit processor is proportional to the number of 
execution units available. However, utilization of this 
performance advantage depends on the efficient sched- 
uling of operations so that most of the execution units 
have a task to perform each clock cycle. Efficient sched- 
uling is particularly important for looped instructions, 
since in a typical runtime application the processor will 
spend the majority of its time in loop execution. 
[0005] Unfortunately, the inclusion of multiple execu- 
tion units also creates new architectural bottlenecks. In- 
creased functionality translates into longer instructions, 
such as may be found in very long instruction word 
(VLIW) architectures. For example, the eight-execution 
unit VLIW processor described above may require a 
256-bit instruction every clock cycle in order to perform 
tasks on all execution units. As it is generally neither
practical nor desirable to provide, for example, a
256-bit-wide parallel data path external to the processor
merely for instruction retrieval, the data rate available for
loading instructions may become the overall limiting factor
in many applications. It is an object of the teachings
disclosed herein to propose a solution to resolve this
bottleneck.

SUMMARY OF THE INVENTION 

[0006] Many high performance signal processors provide
at least some program memory on-chip because of
the delays associated with loading instructions from
external memory. However, the area on a microprocessor
allotted for on-chip memory is by necessity limited, and
prior art on-chip memories provide no ability to
reconfigure this limited and precious resource. The present
teachings seek to solve a heretofore unrecognized
problem: given that the core functionality of some
applications can be loaded on-chip to a sufficiently-sized
memory, while the core functionality of others cannot,
can an on-chip memory be designed to meet the needs
of either type of application, without duplicating and
possibly wasting resources? It has now been recognized
that an on-chip memory that is configurable by the user,
preferably in software, will provide the maximum
flexibility for all applications. The present teachings disclose
a microprocessor with an on-chip memory that may be
configured at runtime to one of several memory modes
as requested by an application.

[0007] In one aspect of the teachings, a microprocessor
is disclosed that comprises a configurable on-chip
memory. Preferably, the microprocessor further comprises
a program memory controller that allows the current
on-chip memory configuration to remain transparent
to the microprocessor central processing unit (CPU)
core during program memory operations. Preferably,
the configurable on-chip memory may be configured as
either memory-mapped or cache memory. The cache
memory may preferably be further configured to operate
in multiple modes, e.g., fully enabled, bypassed, or
read-only.

[0008] In a second aspect of the teachings, the
configurable on-chip memory may be reconfigured during
microprocessor operation under software control. For
instance, a configurable memory may be booted in one
mode, and subsequently switched, once or multiple
times, to other modes, by software commands executed
by the CPU of the microprocessor. Such software commands
preferably alter the operation of the program
memory controller and on-chip memory by changing a
control signal on the microprocessor.
[0009] In yet another aspect of the teachings, the
program memory controller (PMC) operates in either a
memory-mapped mode or a cache mode to determine
if requested addresses are on-chip memory addresses.
The program memory controller preferably supplies
requested fetch packets if on-chip, or halts the processor
and loads requested fetch packets from off-chip. The
PMC checks for requests for memory mode transitions
and initiates transitions when the CPU requests such.
[0010] In a further aspect of the teachings, a tag RAM
is associated with cache memory operation. This tag
RAM preferably operates in conjunction with the program
memory controller, which determines if the fetch
packet at the requested address is currently loaded into
the cache. The program memory controller preferably
has the capability to update the tag RAM when a fetch
packet is loaded from off-chip. The program memory
controller preferably also has the capability to re-initialize
the tag RAM during microprocessor operation, e.g.,
due to a switch in memory configuration.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0011] The present invention will now be further de- 
scribed, by way of example, with reference to the ac- 
companying drawings in which: 

Figure 1 is a block diagram depicting the major
functional blocks of a processor implementation;

Figure 2 is a block diagram illustrating a configuration
of execution units and registers of a multiple-
execution unit processor;

Figure 3 shows the arrangement of instructions in
a fetch packet;

Figures 4a and 4b show maps of processor address
space for two different memory mappings;

Figure 5 depicts instruction address partitioning for
use as a cache address;

Figure 6 depicts the interface between the CPU
core and the program memory controller;

Figure 7 illustrates the states and allowable state
transitions for a program memory controller;

Figure 8 shows the configuration of a status register
that may be used to control a configurable memory;
and

Figure 9 shows the registers and data paths of a
program memory controller.

DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

[0012] Several illustrative embodiments are described
herein. Although it is believed that the teachings
disclosed herein may be readily adapted to virtually any
CPU architecture, for illustrative purposes these embodiments
are described with reference to a specific
VLIW processor family, the Texas Instruments
TMS320C6x. Those of ordinary skill in the pertinent art
should comprehend the description below in sufficient
detail to enable them to reproduce the invention; however,
for specific data related to processor architecture,
instruction set, and operation, the interested reader is
referred to the Texas Instruments TMS320C62xx CPU
and Instruction Set Reference Guide (1997) and the
Texas Instruments TMS320C62xx Peripherals Reference
Guide (1997), which are incorporated herein by
reference.

[0013] Several definitions should also be useful to the
reader. As used herein, an instruction is a function
performable by an execution unit on a processor in one or
more clock cycles. An execute packet is a set of one or
more instructions that will be dispatched to the execution
units during the same clock cycle. A fetch packet is
a standard-sized block of instructions, comprising one
or more execute packets, that is loaded into the CPU as
a single unit.

[0014] A memory-mapped on-chip memory occupies 
a contiguous section of regularly addressable program 
memory. A cache on-chip memory contains a copy of 
instructions that also reside in external memory and that 
have been previously requested (usually those most re- 
cently requested) by the CPU. These do not necessarily 
represent a contiguous section of program memory, and 
are not generally explicitly addressable by the CPU. 

[0015] The Texas Instruments TMS320C6x (C6x)
processor family comprises several preferred
embodiments. The C6x family includes both scalar and
floating-point architectures. The CPU core of these
processors contains eight execution units, each of
which requires a 31-bit instruction. If all eight execution
units of a processor are issued an instruction for a given
clock cycle, the maximum instruction word length of 256
bits (eight 31-bit instructions plus 8 bits indicating parallel
sequencing) is required.
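
The word-length arithmetic above can be checked with a short sketch. The function names and the bus-width parameter are illustrative assumptions and do not appear in the disclosure; the second helper only illustrates why a narrow external path limits the instruction rate.

```python
EXECUTION_UNITS = 8   # from the text
OPCODE_BITS = 31      # instruction bits per execution unit, from the text
P_BITS = 1            # one parallel-sequencing bit per instruction

def max_instruction_word_bits(units=EXECUTION_UNITS):
    """Bits needed to issue one instruction to every execution unit."""
    return units * (OPCODE_BITS + P_BITS)

def bus_transfers_per_word(word_bits, bus_bits):
    """Transfers needed to move one instruction word over a bus
    of the given width (ceiling division)."""
    return -(-word_bits // bus_bits)
```

For example, `bus_transfers_per_word(256, 32)` gives the number of transfers a 32-bit-wide path would need for one maximum-length instruction word.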

[0016] A block diagram of a C6x processor connected
to several external data systems is shown in Figure 1.
Processor 10 comprises a CPU core 20 in communication
with program memory controller 30 and data memory
controller 12. Other significant blocks of the processor
include peripherals 14, a peripheral bus controller
17, and a DMA controller 18.

[0017] Processor 10 is configured such that CPU core
20 need not be concerned with whether data and
instructions requested from memory controllers 12 and 30
actually reside on-chip or off-chip. If requested data
resides on-chip, controller 12 or 30 will retrieve the data
from respective on-chip data memory 13 or program
memory/cache 31. If the requested data does not reside
on-chip, these units request the data from external
memory interface (EMIF) 16. EMIF 16 communicates
with external data bus 70, which may be connected to
external data storage units such as a disk 71, ROM 72,
or RAM 73. External data bus 70 is 32 bits wide.
[0018] CPU core 20 includes two generally similar data
paths 24a and 24b, as shown in Figure 1 and detailed
in Figure 2. The first path includes a shared multiport
register file A and four execution units, including an
arithmetic and load/store unit D1, an arithmetic and
shifter unit S1, a multiplier M1, and an arithmetic unit
L1. The second path includes register file B and execution
units L2, S2, M2, and D2. Capability (although limited)
exists for sharing data across these two data paths.
[0019] Because CPU core 20 contains eight execution
units, instruction handling is an important function.
Groups of instructions are requested by program fetch
21 and received from program memory controller 30 as
fetch packets. Instruction dispatch 22 distributes
instructions from fetch packets among the execution units
as execute packets, and instruction decode 23 decodes
the instructions.

[0020] In the preferred embodiment, a fetch packet
has a fixed length of eight instructions, as shown in
Figure 3. The execution grouping of the fetch packet is
specified by the p-bit, bit zero, of each instruction. Fetch
packets are eight-word aligned in program memory.
[0021] The p-bit controls the parallel execution of
instructions. The p-bits are scanned from left to right
(lower to higher address) by instruction dispatch 22. If the
p-bit of instruction i is 1, then instruction i+1 is to be
executed in parallel with instruction i, i.e., in the same
execute packet. Thus an execute packet may contain from
one to eight instructions, and a fetch packet may contain
from one to eight execute packets, depending on the
size of the execute packets. Each instruction in an
execute packet must use a different execution unit. An
execute packet also cannot cross an eight-word boundary.
Thus, the last p-bit in a fetch packet is always set to 0,
and each fetch packet starts with a new execute packet.
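
The p-bit scan described above can be sketched as follows. This is a minimal illustration, assuming a fetch packet is given as a list of eight 32-bit words ordered from lowest to highest address; the function name is hypothetical, and the check that each instruction targets a different execution unit is omitted.

```python
def split_execute_packets(fetch_packet):
    """Group the eight words of a fetch packet into execute packets.

    A p-bit (bit zero) of 1 on instruction i means instruction i+1
    executes in parallel with it, i.e. in the same execute packet.
    """
    if len(fetch_packet) != 8:
        raise ValueError("a fetch packet holds exactly eight instructions")
    if fetch_packet[-1] & 1:
        raise ValueError("the last p-bit in a fetch packet must be 0")
    packets, current = [], []
    for word in fetch_packet:        # scan lower to higher address
        current.append(word)
        if word & 1 == 0:            # p-bit clear closes the execute packet
            packets.append(current)
            current = []
    return packets
```

A packet whose p-bits read 1, 1, 0, 0, 1, 1, 1, 0 splits into execute packets of three, one, and four instructions.
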
[0022] Because of this variable execute packet length
and fixed fetch packet length, on-chip program memory
31 in the preferred embodiment is aligned by fetch
packets. If an instruction that resides in the middle of a fetch
packet is requested by the CPU, the entire fetch packet
is retrieved, but all instructions at lower addresses are
ignored (even if they would have otherwise operated in
parallel with the requested instruction).
[0023] The physically addressable address space of 
the C6x processor is 4 Gbytes. On-chip program mem- 
ory 31 has a size of 64K bytes. However, each instruc- 
tion requires four bytes, and each fetch packet contains 
eight instructions, such that on-chip program memory 
31 is arranged as 2K frames, each frame holding one 
fetch packet of 32 bytes, or 256 bits, in length. In mem- 
ory map mode, the 64K bytes of on-chip memory may 
be selected to reside at a contiguous block of memory 
in address space starting at address 140 0000, as 
shown in Figure 4A, or at a starting address of 000 0000, 
as shown in Figure 4B. 
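
The frame arithmetic and the on-chip address test of memory map mode can be sketched as below. Reading "140 0000" as the hexadecimal address 0x01400000 is an assumption, as are all names in the sketch.

```python
ON_CHIP_BYTES = 64 * 1024          # size of on-chip program memory 31
FETCH_PACKET_BYTES = 8 * 4         # eight four-byte instructions
NUM_FRAMES = ON_CHIP_BYTES // FETCH_PACKET_BYTES   # 2K frames

BASE_FIG_4A = 0x0140_0000          # assumed hex reading of "140 0000"
BASE_FIG_4B = 0x0000_0000

def on_chip_frame(address, base):
    """Return the frame index holding `address`, or None if the
    address falls outside the 64K on-chip block starting at `base`."""
    offset = address - base
    if 0 <= offset < ON_CHIP_BYTES:
        return offset // FETCH_PACKET_BYTES
    return None
```

A `None` result corresponds to the case where the request must be passed to the external memory interface.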

[0024] In cache mode, the representative embodi- 
ments assume that instructions will occupy a maximum 
external address space of 64 Mbytes. Thus the cache 
in these embodiments ignores the top six bits of an ad- 
dress in cache mode, as shown in Figure 5. The cache 
also ignores the bottom five bits of an address, as the 
cache stores only fetch-packet-aligned (i.e. 32-byte- 
aligned) data. Bits 5 to 25 of an instruction address are 
the only bits used to map external address space into 
cache locations. 

[0025] As shown in Figure 5, bits 5 to 25 are divided
within the PMC into a ten-bit tag (bits 16-25) and an
eleven-bit block offset (bits 5-15). The program memory
controller 30 contains a tag RAM 32 (see Figure 9) that is
capable of storing 2K tags, one for each frame in memory
31, in order to track the contents of the cache. The
eleven-bit block offset is used both as an address for
the appropriate tag within tag RAM 32 and as an address
for the appropriate frame within memory 31. Each
eleven-bit location within tag RAM 32 contains a validity
bit and a ten-bit tag. Although external addresses 64K
apart map to the same location in the tag RAM, each
external address maps to a unique combination of block
offset and tag.
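
The address partitioning can be illustrated with a few shifts and masks. This is a sketch only; the function name is hypothetical, while the bit positions follow the text (bits 0-4 and 26-31 ignored, bits 5-15 as block offset, bits 16-25 as tag).

```python
def split_cache_address(address):
    """Return (tag, block_offset) for a 32-bit instruction address."""
    block_offset = (address >> 5) & 0x7FF   # bits 5-15, eleven bits
    tag = (address >> 16) & 0x3FF           # bits 16-25, ten bits
    return tag, block_offset
```

Two addresses 64K apart, such as 0x00010020 and 0x00020020, share block offset 1 but differ in tag, which is how the tag RAM tells them apart.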

[0026] When the cache is initialized and enabled, the
validity bit at each tag location is marked invalid. Then,
as each new fetch packet is requested, its address is
partitioned within PMC 30 into a compare tag and a
block offset. The block offset is used to retrieve a tag
from tag RAM 32. If the tag validity bit is invalid, it is set
and the compare tag is written into the tag RAM using
the block offset as an address, and a cache miss is
declared. If the tag validity bit of the retrieved tag is set,
the retrieved tag is compared to the compare tag in tag
comparator 34. If the two tags fail to match, a cache miss
is declared and the compare tag is written into the tag
RAM using the block offset as an address. If the two tags
are identical, comparator 34 registers a cache hit and
the tag RAM is not modified.
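
The lookup sequence just described amounts to a direct-mapped tag check. The following is a minimal software model, assuming fully enabled cache operation; the class and method names are hypothetical and the hardware comparator is reduced to an equality test.

```python
class TagRam:
    """Software model of a 2K-entry tag RAM with validity bits."""

    def __init__(self, frames=2048):
        self.valid = [False] * frames   # all entries invalid after init
        self.tags = [0] * frames

    def lookup(self, address):
        """Return True on a cache hit. On a miss, record the new tag
        (the frame is about to be refilled) and return False."""
        block_offset = (address >> 5) & 0x7FF
        compare_tag = (address >> 16) & 0x3FF
        if self.valid[block_offset] and self.tags[block_offset] == compare_tag:
            return True                  # hit: tag RAM is not modified
        self.valid[block_offset] = True  # miss: set valid, store compare tag
        self.tags[block_offset] = compare_tag
        return False
```

A first access to any address misses, a repeated access hits, and an access to an address 64K away evicts the earlier entry.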

[0027] If a cache hit occurs, the requested fetch packet
is retrieved from on-chip memory 31 using the block
offset as an address. With a cache miss, the requested
fetch packet is retrieved by sending the external address
to EMIF 16 for off-chip retrieval. As the instructions of
the fetch packet are received from EMIF 16 they are written
into on-chip memory 31 one 32-bit instruction at a
time, using the block offset as an address. Once the
entire fetch packet is received, it is sent to the CPU.

[0028] Although the cache is typically fully enabled
during caching, several other cache modes are available
to the user. Cache freeze mode operates similarly to
cache enable mode, except that the cache and tag RAM
are never updated. This mode is useful for protecting
valuable cache contents, e.g., during interrupt service.
Cache bypass mode causes a cache miss on every
fetch, effectively removing on-chip memory 31 from
service.
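
The difference between the three cache modes can be summarized as a small decision function. This is a sketch under the reading given above; the enum and function names are hypothetical.

```python
from enum import Enum

class CacheMode(Enum):
    ENABLED = "enabled"
    FREEZE = "freeze"
    BYPASS = "bypass"

def fetch_policy(mode, tag_hit):
    """Return (serve_from_cache, update_cache_and_tag_ram)."""
    if mode is CacheMode.BYPASS:
        return (False, False)    # every fetch is treated as a miss
    if tag_hit:
        return (True, False)     # hits never modify the tag RAM
    # On a miss, only the fully enabled cache is refilled.
    return (False, mode is CacheMode.ENABLED)
```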

[0029] During processor operation, on-chip memory
operations are preferably transparent to the CPU, such
that program data requests and program data stores are
handled in a uniform fashion. Referring now to Figure 6,
the PMC and the CPU interface with a program address
bus 44, a program data bus 43, and several control
signals. The PROGRAM ADDRESS STROBE (PAS) signal
is sent by the CPU when it places an instruction request
on the program address bus. The PROGRAM DATA
STROBE (PDS) signal is sent by the CPU when it needs
program data (this typically occurs one to eight CPU
cycles after the PAS signal is sent). The PROGRAM
WRITE STROBE (PWS) signal is sent by the CPU when
it desires to write data to program memory. The PMC
uses the RDY signal to acknowledge that it is supplying
requested fetch packets as needed. The RDY signal is
taken low to stall the CPU if the PMC cannot produce
the program data when the PDS requests it. The RDY
signal may also be taken low at other times, as
described below.

[0030] Figure 7 illustrates the states and allowable 
state transitions for the program memory controller of 
the C6x processor embodiment. These states may be 
divided generally into three categories as shown: mem- 
ory map states, cache states, and transition states. A 
description of each state and its corresponding state 
transition conditions follows. 

[0031] Referring again to Figure 7, RESET PMC is the
boot state of the PMC. The PMC typically stays in this
state whenever the RESET pin of the processor is
asserted. However, the PMC may transition to a BOOT
LOAD state from RESET PMC if the DMA provides a
request during RESET. During BOOT LOAD, the DMA
may store data into the on-chip memory. Once the DMA
request has been serviced in BOOT LOAD, the PMC
transitions back to RESET PMC.

[0032] Upon release of RESET, the PMC transitions 
to memory map mode and the FETCH RUN state. 
FETCH RUN is the default state of the PMC in memory 
map mode. The PMC idles in this state until a request 
is received. If the CPU has requested a fetch packet by 
asserting PAS, the PMC determines if the address on 
bus 44 is an on-chip memory address. If the address is 
an on-chip address, the requested fetch packet is 
placed on the program data bus. If the address is an
off-chip address, the PMC sends the address to the EMIF
for program data retrieval.

[0033] The PMC transitions from FETCH RUN to 
FETCH STALL if the requested fetch packet has not 
been retrieved before the CPU indicates it needs the da- 
ta by asserting PDS (typically one to eight clock cycles 
after the CPU asserts PAS). In FETCH STALL, the PMC 
halts the CPU by deasserting the RDY signal until the 
requested fetch packet has been received. Once the 
PMC retrieves the fetch packet, the PMC transitions 
back to FETCH RUN and RDY is reasserted. 
[0034] The PMC may also transition from FETCH 
RUN to WRITE ON CHIP if a store program (STP) in- 
struction is executed by the CPU. The STP instruction 
causes the CPU to assert PWS, indicating to the PMC 
that an instruction write is requested. In WRITE ON 
CHIP, the program address on address bus 44 is eval- 
uated by the PMC; if it is a valid on-chip address, the 
instruction on program data bus 43 is written into on- 
chip memory 31 and the PMC transitions back to 
FETCH RUN. If the address is an off-chip address, the 
PMC transitions to WRITE OFF CHIP. In either case, 
WRITE ON CHIP is a one-cycle state. RDY is deassert- 
ed in this state. 

[0035] The WRITE OFF CHIP state is only entered
from WRITE ON CHIP, and RDY remains deasserted in
this state. WRITE OFF CHIP passes the instruction
address and data to the EMIF for writing. The PMC
remains in this state until the EMIF has written the data,
and then transitions back to FETCH RUN.
[0036] The final memory mode state is DMA REQUEST.
The DMA can write to on-chip memory during
this one-cycle state. However, the CPU is given priority
over the DMA, and no transition from FETCH RUN to
DMA REQUEST will occur as long as the CPU has
pending requests. Note also that no corresponding state
exists for cache operation: as the cache stores a copy
of off-chip memory, the results of a write only to on-chip
cache would be unstable. Thus, DMA requests in cache
mode are ignored. As an alternative, the DMA request
could be handled similarly to STP requests in cache mode
(see the CACHE WRITE state below).
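
The memory map states described in the preceding paragraphs can be collected into one transition function. The sketch below is illustrative only: the input names are hypothetical, the ordering of the checks in FETCH RUN is an assumption, and the return from DMA REQUEST to FETCH RUN is inferred from its being a one-cycle state.

```python
from dataclasses import dataclass
from enum import Enum, auto

class State(Enum):
    FETCH_RUN = auto()
    FETCH_STALL = auto()
    WRITE_ON_CHIP = auto()
    WRITE_OFF_CHIP = auto()
    DMA_REQUEST = auto()

@dataclass
class Inputs:
    pds: bool = False              # CPU needs the program data now
    pws: bool = False              # CPU requests an instruction write (STP)
    packet_ready: bool = True      # requested fetch packet has been retrieved
    address_on_chip: bool = True   # write address is a valid on-chip address
    emif_write_done: bool = False  # EMIF has finished the off-chip write
    dma_request: bool = False
    cpu_pending: bool = False      # CPU has pending requests

def next_state(state, i):
    if state is State.FETCH_RUN:
        if i.pds and not i.packet_ready:
            return State.FETCH_STALL
        if i.pws:
            return State.WRITE_ON_CHIP
        if i.dma_request and not i.cpu_pending:   # CPU has priority over DMA
            return State.DMA_REQUEST
        return State.FETCH_RUN
    if state is State.FETCH_STALL:
        return State.FETCH_RUN if i.packet_ready else State.FETCH_STALL
    if state is State.WRITE_ON_CHIP:              # one-cycle state
        return State.FETCH_RUN if i.address_on_chip else State.WRITE_OFF_CHIP
    if state is State.WRITE_OFF_CHIP:
        return State.FETCH_RUN if i.emif_write_done else State.WRITE_OFF_CHIP
    return State.FETCH_RUN                        # DMA_REQUEST: assumed return

# RDY level in each state where the text states it.
RDY_ASSERTED = {
    State.FETCH_RUN: True,
    State.FETCH_STALL: False,
    State.WRITE_ON_CHIP: False,
    State.WRITE_OFF_CHIP: False,
}
```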

[0037] The PMC has a separate set of states for memory
and cache modes, although functional similarities
exist between the two modes. The resting cache mode
state is STROBE WAIT RUN; the PMC returns to this
state when there are no pending fetches, and remains
in this state until the CPU asserts PAS or PWS.
[0038] When the CPU asserts PAS, the PMC transitions
to HIT RUN. In this state, the PMC determines if
the cache contains a valid replica of the requested fetch
packet. If it does, a cache hit is declared and the packet
is returned from the cache, and the PMC transitions
back to STROBE WAIT RUN unless another request is
pending. If the requested fetch packet is not in the
cache, the PMC declares a miss and transitions to MISS
RUN. RDY remains asserted in HIT RUN.

[0039] In MISS RUN, RDY remains asserted as the
PMC fetches the requested packet from off-chip via the
EMIF. In this state, if the cache is fully enabled the tag
RAM will be updated and the packet will be written into
the corresponding cache location as it is received from
off-chip. The PMC remains in MISS RUN until the entire
packet is fetched, unless the CPU requests the fetch
packet data before the fetch is completed, in which case
a transition to MISS STALL occurs. Once the fetch is
completed, the PMC may transition back to STROBE
WAIT RUN if no further requests are pending, to HIT
RUN if an in-cache request is pending, or remain in
MISS RUN if an off-chip request is pending.
[0040] If the CPU requests off-chip data before it has
been completely retrieved, the PMC transitions to MISS
STALL, deasserts RDY, and stalls the CPU until the
fetch has completed. Once the off-chip fetch is completed,
the PMC transitions to MISS RUN if an additional
off-chip request is pending; otherwise, it transitions to
HIT RUN.

[0041] The PMC may also transition from STROBE
WAIT RUN, HIT RUN, or MISS STALL to CACHE
WRITE if the CPU asserts the PWS signal (the transition
occurs after pending fetch requests are completed). In
CACHE WRITE, the CPU is stalled by deasserting RDY,
and the data on program data bus 43 is written to the
physical off-chip address appearing on program address
bus 44. In this state, the tag associated with this
address is cleared in the tag RAM. One alternative to
clearing the tag would be to update the tag RAM and
on-chip memory after writing the new value into off-chip
memory.
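
The cache mode transitions described above can be tabulated as (state, event) pairs. This is a sketch, not the controller logic itself: the event names are hypothetical labels for the conditions in the text, and the exit from CACHE WRITE is omitted because the text does not state it.

```python
TRANSITIONS = {
    ("STROBE_WAIT_RUN", "PAS"): "HIT_RUN",
    ("STROBE_WAIT_RUN", "PWS"): "CACHE_WRITE",
    ("HIT_RUN", "HIT_NO_PENDING"): "STROBE_WAIT_RUN",
    ("HIT_RUN", "MISS"): "MISS_RUN",
    ("HIT_RUN", "PWS"): "CACHE_WRITE",
    ("MISS_RUN", "PDS"): "MISS_STALL",   # CPU needs data before fetch completes
    ("MISS_RUN", "FETCH_DONE_IDLE"): "STROBE_WAIT_RUN",
    ("MISS_RUN", "FETCH_DONE_HIT_PENDING"): "HIT_RUN",
    ("MISS_RUN", "FETCH_DONE_MISS_PENDING"): "MISS_RUN",
    ("MISS_STALL", "FETCH_DONE_MISS_PENDING"): "MISS_RUN",
    ("MISS_STALL", "FETCH_DONE_HIT_PENDING"): "HIT_RUN",
    ("MISS_STALL", "FETCH_DONE_IDLE"): "HIT_RUN",
    ("MISS_STALL", "PWS"): "CACHE_WRITE",
}

# RDY level in each state where the text states it.
RDY_ASSERTED = {"HIT_RUN": True, "MISS_RUN": True,
                "MISS_STALL": False, "CACHE_WRITE": False}

def step(state, event):
    """Return the next state; stay in place if no transition is listed."""
    return TRANSITIONS.get((state, event), state)

def run(events, state="STROBE_WAIT_RUN"):
    """Apply a sequence of events starting from the resting state."""
    for event in events:
        state = step(state, event)
    return state
```

For example, a fetch that misses and stalls the CPU follows STROBE WAIT RUN, HIT RUN, MISS RUN, MISS STALL, and then returns to HIT RUN when the fetch completes.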

[0042] Although the C6x has been designed to always 
boot the on-chip memory in memory map mode, one of 
the key features of the disclosed teachings is the ability 
to reconfigure on-chip memory during processor oper- 
ation. Although this could be done with an externally- 
supplied signal, in the preferred embodiment the CPU 
controls the mode of on-chip memory. As illustrated in 
Figure 8, the C6x CPU Control Status Register (CSR) 
contains a PCC field that indicates the desired program
memory mode, and is observable by the program memory
controller. In the C6x, the PCC is implemented as a
three-bit field with four valid values (the other four are
reserved for future implementation of additional modes).
PCC value 000 represents memory mapped mode, and
is the reset state. PCC value 010 represents cache
enabled mode. PCC value 011 represents cache freeze
mode, where cache contents are retained and readable,
but off-chip reads do not affect the cache. PCC value
100 represents cache bypass mode, which essentially
bypasses on-chip memory and forces all reads to
come from off-chip.

[0043] The user may select a PCC value that provides 
best performance for an application or portion of an ap- 
plication then executing on the processor. The user typ- 
ically changes the PCC value by reading the CSR, mod- 
ifying the PCC field, and writing the modified contents 
back into the CSR. From the standpoint of the PMC state
machine, the most significant PCC events are transitions
between the memory map state and one of the
cache states.
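
The read-modify-write sequence on the CSR can be sketched as below, using the four PCC encodings given above. The position of the PCC field within the CSR (bits 5 to 7 here) is an assumption made for illustration, since the text defers to Figure 8 for the register layout; the function and constant names are likewise hypothetical.

```python
PCC_SHIFT = 5                  # assumed field position; see Figure 8
PCC_MASK = 0b111 << PCC_SHIFT

PCC_MEMORY_MAPPED = 0b000      # reset state
PCC_CACHE_ENABLED = 0b010
PCC_CACHE_FREEZE = 0b011
PCC_CACHE_BYPASS = 0b100
VALID_PCC = {PCC_MEMORY_MAPPED, PCC_CACHE_ENABLED,
             PCC_CACHE_FREEZE, PCC_CACHE_BYPASS}

def get_pcc(csr):
    """Extract the three-bit PCC field from a CSR value."""
    return (csr & PCC_MASK) >> PCC_SHIFT

def set_pcc(csr, mode):
    """Return the CSR value with only the PCC field replaced."""
    if mode not in VALID_PCC:
        raise ValueError("reserved PCC value")
    return (csr & ~PCC_MASK & 0xFFFFFFFF) | (mode << PCC_SHIFT)
```

Clearing the field before OR-ing in the new value leaves every other CSR bit as it was read, which is the point of the read-modify-write sequence.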

[0044] While in memory map mode, the PMC checks 
the value of PCC in FETCH RUN and FETCH STALL 
states. If the PCC changes to a cache state, after the 
current fetch request is completed the PMC will transi- 
tion to MEM TO CACHE. MEM TO CACHE stalls the 
CPU while it initializes tag RAM 32 by clearing the valid 
bit associated with each tag. Although different imple- 
mentations are possible, the C6x clears the bits one tag 
per clock cycle. The PMC in the C6x remains in MEM 
TO CACHE for 2049 clock cycles, 2048 of these being 
required to clear the 2K tags in the tag RAM. 
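
The tag RAM initialization performed in MEM TO CACHE can be modeled as a counter sweeping the validity bits. This is a sketch only; the function name is hypothetical, and the purpose of the one cycle beyond the 2048 clearing cycles is not stated in the text, so it is simply counted.

```python
def mem_to_cache(valid_bits):
    """Clear one validity bit per clock cycle and return the number
    of cycles spent in the state."""
    cycles = 0
    for counter in range(len(valid_bits)):   # counter addresses the tag RAM
        valid_bits[counter] = False
        cycles += 1
    cycles += 1   # the text gives 2049 cycles in total for 2K tags
    return cycles
```
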
[0045] If no fetch requests were pending at the tran- 
sition to MEM TO CACHE, the PMC transitions to 
STROBE WAIT RUN in cache mode after initializing the 
tag RAM. If a request was pending, the PMC transitions
instead to MISS STALL.

[0046] The PMC performs a similar check of PCC in 
cache mode. However, it will not transition to memory 
map mode until a cache miss occurs, i.e., transitions to 
the CACHE TO MEM state occur from the MISS RUN 
and MISS STALL states. In CACHE TO MEM, the PMC 
stalls the CPU. CACHE TO MEM clears up any pending 
fetch requests and then transitions to FETCH RUN in 
memory map mode. 



[0047] In this embodiment, the PMC takes no action
with regard to the on-chip memory upon transition from
cache to memory map mode. Thus the user is responsible
for ensuring that the memory-map contents are not
used without proper initialization. Other embodiments of
CACHE TO MEM are possible, such as one that fills
on-chip memory from a specified location in off-chip
memory before transitioning to memory-map mode.
[0048] The registers and data paths through the PMC
are illustrated in Figure 9. Because the CPU core 20 is
allowed to request a second fetch packet before it is
ready to receive a first, two pipelined address registers
35 and 36 are used to handle multiple fetch requests.
Likewise, both requests may be serviced (typically if
both are on-chip) before CPU core 20 is ready for data;
thus two pipelined data registers 37 and 38 are used to
sequence retrieved data. Write data register 39 and
write address register 40 are dedicated for program
stores. Counter 41 is used for initializing tag RAM 32,
e.g., in the MEM TO CACHE state. Figure 9 further
illustrates how these registers are interconnected, and how
the various data paths may be multiplexed to implement
the functionality described in conjunction with Figure 7.
[0049] Although the invention has been described 
herein with reference to a specific processor architec- 
ture, it is recognized that one of ordinary skill can readily 
adapt the described embodiments to operate on other 
processors, regardless of instruction size, on-chip or off- 
chip memory size, bus size, or utilization of instruction 
pipelining. Likewise, nothing in this description should 
be seen as limiting the possible memory modes of a 
processor employing a user-configurable memory. For
instance, other modes such as explicit boot modes, other
known caching modes, and partitioned on-chip
modes (multiple cache or part-mapped/part-cache) may
be implemented using this disclosure. And although the
preferred embodiments have been described using a
specific controller design, those of ordinary skill will
recognize upon reading this disclosure that the basic idea
of a configurable on-chip memory may be logically im- 
plemented in many equivalent designs. Other obvious 
modifications will be apparent to those of ordinary skill 
in the art upon reading this disclosure. 
[0050] The scope of the present disclosure includes 
any novel feature or combination of features disclosed 
therein either explicitly or implicitly or any generalisation 
thereof irrespective of whether or not it relates to the 
claimed invention or mitigates any or all of the problems 
addressed by the present invention. The applicant here- 
by gives notice that new claims may be formulated to 
such features during the prosecution of this application 
or of any such further application derived therefrom. In 
particular, with reference to the appended claims, fea- 
tures from dependent claims may be combined with 
those of the independent claims and features from re- 
spective independent claims may be combined in any 
appropriate manner and not merely in the specific com- 
binations enumerated in the claims. 
Claims

1. A microprocessor comprising:

a central processing unit; and
an on-chip memory for storing instructions
executable on said central processing unit, said
on-chip memory system having a plurality of
selectable configurations.

2. The microprocessor of Claim 1, wherein said
plurality of selectable configurations comprise both a
memory map configuration and a cache configuration.

3. The microprocessor of Claim 1 or Claim 2, wherein
said selectable configurations are selectable only
during microprocessor boot and reset operations.

4. The microprocessor of Claim 1 or Claim 2, wherein
said selectable configurations are selectable by the
central processing unit during microprocessor
operation.

5. The microprocessor of any of Claims 1 to 4, wherein
said on-chip memory comprises a program memory
controller arranged for communication with said
on-chip memory.

6. The microprocessor of Claim 5, wherein said
on-chip memory has a fixed configuration and said
program memory controller has a plurality of operational
modes, and wherein selection of one of said
selectable configurations of said on-chip memory
comprises selecting the operational mode of said
program memory controller.

7. The microprocessor of Claim 6, wherein said
operational modes comprise a memory map mode and
a cache enabled mode.

8. The microprocessor of any of Claims 5 to 7, wherein
said on-chip memory further comprises a tag memory
array controllable by said program memory controller
for storing information pertaining to the
contents of said on-chip memory array.

9. A microprocessor comprising:

a central processing unit;
an on-chip memory for storing instructions
executable on said central processing unit;
an external memory interface capable of reading
from and writing to an off-chip memory
instructions executable on said central processing
unit; and
a configurable program memory controller
arranged for communication with said central
processing unit, said on-chip memory array,
and said external memory interface, said
configurable program memory controller having a
plurality of operating modes, including a first
mode in which it uses said on-chip memory as
a memory-mapped on-chip memory, and a second
mode in which it uses said on-chip memory
as a cache on-chip memory.

10. The microprocessor of Claim 9, wherein
said microprocessor is arranged such that during
operation said central processing unit can
request said program memory controller to
switch from its current operating mode to a
different operating mode, and wherein said program
memory controller is capable of reconfiguring
itself during microprocessor operation in
response to said request.